
    High-performance generalized tensor operations: A compiler-oriented approach

    The efficiency of tensor contraction is of great importance. Compilers cannot optimize it well enough to come close to the performance of expert-tuned implementations. All existing approaches that provide competitive performance require optimized external code. We introduce a compiler optimization that reaches the performance of optimized BLAS libraries without the need for an external implementation or automatic tuning. Our approach provides competitive performance across hardware architectures and can be generalized to deliver the same benefits for algebraic path problems. By making fast linear algebra kernels available to everyone, we expect productivity increases when optimized libraries are not available. © 2018 Association for Computing Machinery
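    As a minimal sketch of the link between matrix-style contractions and algebraic path problems mentioned in the abstract above, the same triple loop nest can be parameterized by its "add" and "multiply" operations. This is not the paper's compiler optimization; the function name `generalized_matmul` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Callable, List

Matrix = List[List[float]]


def generalized_matmul(
    a: Matrix,
    b: Matrix,
    add: Callable[[float, float], float],
    mul: Callable[[float, float], float],
    zero: float,
) -> Matrix:
    """Contract a (n x k) with b (k x m) over a user-supplied semiring.

    Hypothetical helper for illustration; `zero` must be the identity of `add`.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[zero] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = zero
            for p in range(k):
                # Same loop structure as a plain matrix multiply.
                acc = add(acc, mul(a[i][p], b[p][j]))
            c[i][j] = acc
    return c


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]

# Ordinary (+, *) semiring: standard matrix product.
plain = generalized_matmul(A, B, lambda x, y: x + y, lambda x, y: x * y, 0.0)

# (min, +) semiring: one relaxation step of a shortest-path computation.
tropical = generalized_matmul(A, B, min, lambda x, y: x + y, float("inf"))
```

    Only the operators change between the two calls; the loop nest, and therefore the blocking and vectorization opportunities a compiler would look for, stay the same.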

    Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels

    Achieving optimal program performance requires deep insight into the interaction between hardware and software. For software developers without an in-depth background in computer architecture, understanding and fully utilizing modern architectures is close to impossible. Analytic loop performance modeling is a useful way to understand the relevant bottlenecks of code execution based on simple machine models. The Roofline model and the Execution-Cache-Memory (ECM) model are proven approaches to performance modeling of loop nests. In comparison to the Roofline model, the ECM model can also describe the single-core performance and saturation behavior on a multicore chip. We give an introduction to the Roofline and ECM models, and to stencil performance modeling using layer conditions (LC). We then present Kerncraft, a tool that can automatically construct Roofline and ECM models for loop nests by performing the required code, data transfer, and LC analysis. The layer condition analysis makes it possible to predict optimal spatial blocking factors for loop nests. Together with the models, it enables an ab initio estimate of the potential benefits of loop blocking optimizations and of useful block sizes. In cases where LC analysis is not easily possible, Kerncraft supports a cache simulator as a fallback option. Using a 25-point long-range stencil, we demonstrate the usefulness and predictive power of the Kerncraft tool. Comment: 22 pages, 5 figures
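    For orientation, the basic Roofline bound referred to in the abstract above can be sketched in a few lines: attainable performance is the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. This is the textbook form of the model, not Kerncraft's implementation or API; the function name and the machine numbers below are assumptions made up for the example.

```python
def roofline_gflops(
    peak_gflops: float,
    mem_bw_gbs: float,
    intensity_flop_per_byte: float,
) -> float:
    """Attainable performance (GFLOP/s) under the basic Roofline model.

    Hypothetical helper: the result is capped either by the compute peak
    or by memory bandwidth times arithmetic intensity.
    """
    return min(peak_gflops, mem_bw_gbs * intensity_flop_per_byte)


# Assumed machine: 100 GFLOP/s peak, 50 GB/s memory bandwidth.
memory_bound = roofline_gflops(100.0, 50.0, 0.125)  # low-intensity kernel
compute_bound = roofline_gflops(100.0, 50.0, 4.0)   # high-intensity kernel
```

    With these assumed numbers, the first kernel is limited by memory bandwidth and the second by the compute peak. The ECM model refines this picture with per-cache-level transfer times, which this sketch does not attempt to capture.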

    Mixed-data-model heterogeneous compilation and OpenMP offloading

    Heterogeneous computers combine a general-purpose host processor with domain-specific programmable many-core accelerators, uniting high versatility with high performance and energy efficiency. While the host manages ever-more application memory, accelerators are designed to work mainly on their local memory. This difference in addressed memory leads to a discrepancy between the optimal address width of the host and the accelerator. Today 64-bit host processors are commonplace, but few accelerators exceed 32-bit addressable local memory, a difference expected to increase with 128-bit hosts in the exascale era. Managing this discrepancy requires support for multiple data models in heterogeneous compilers. So far, compiler support for multiple data models has not been explored, which hampers the programmability of such systems and inhibits their adoption. In this work, we perform the first exploration of the feasibility and performance of implementing a mixed-data-model heterogeneous system. To support this, we present and evaluate the first mixed-data-model compiler, supporting arbitrary address widths on host and accelerator. To hide the inherent complexity and to enable high programmer productivity, we implement transparent offloading on top of OpenMP. The proposed compiler techniques are implemented in LLVM and evaluated on a 64+32-bit heterogeneous SoC. Results on benchmarks from the PolyBench-ACC suite show that memory can be transparently shared between host and accelerator at overheads below 0.7% compared to 32-bit-only execution, enabling mixed-data-model computers to execute at near-native performance.

    Towards an Achievable Performance for the Loop Nests

    Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the code on a target architecture, including exposing parallelism. Finding and evaluating an optimal, semantics-preserving sequence of transformations is a complex problem. The sequence is guided using heuristics and/or analytical models, and there is no way of knowing how close it gets to optimal performance or whether there is any headroom for improvement. This paper makes two contributions. First, it uses a comparative analysis of loop optimizations/transformations across multiple compilers to determine how much headroom may exist for each compiler. Second, it presents an approach to characterize the loop nests based on their hardware performance counter values and a Machine Learning approach that predicts which compiler will generate the fastest code for a loop nest. The prediction is made for both auto-vectorized, serial compilation and for auto-parallelization. The results show that the headroom for state-of-the-art compilers ranges from 1.10x to 1.42x for the serial code and from 1.30x to 1.71x for the auto-parallelized code. These results are based on the Machine Learning predictions. Comment: Accepted at the 31st International Workshop on Languages and Compilers for Parallel Computing (LCPC 2018)

    Simultaneous magma and gas eruptions at three volcanoes in southern Italy: an earthquake trigger?

    In September 2002, a series of tectonic earthquakes occurred north of Sicily, Italy, followed by three events of volcanic unrest within 150 km. On October 28, 2002, Mt. Etna erupted; on November 3, 2002, submarine degassing occurred near Panarea Island; and on December 28, 2002, Stromboli Island erupted. All of these events were considered unusual: the Mt. Etna NE-rift eruption was the largest in 55 yr, the Panarea degassing was one of the strongest ever detected there, and the Stromboli eruption, which produced a landslide and tsunami, was the largest effusive eruption in 17 yr. Here, we investigate the synchronous occurrence of these clustered unrest events and develop a possible explanatory model. We compute short-term earthquake-induced dynamic strain changes and compare them to long-term tectonic effects. Results suggest that the earthquake-induced strain changes exceeded annual tectonic strains by at least an order of magnitude. This agitation occurred in seconds and may have induced fluid and gas pressure migration within the already active hydrothermal and magmatic systems.

    Design of a Butyl Acetate Synthesis Unit

    The object of development is the production of butyl acetate by esterification with sulfuric acid as the catalyst. The aim of the work is to study the physicochemical properties of the process and their influence on the course of the reaction, and to design the main apparatus for butyl acetate synthesis. As a result of the study, the material and heat balances were calculated and structural and mechanical calculations were performed, on the basis of which a drawing of the main apparatus was produced. Keywords include esterification and feasibility study, among others. The final qualifying work was carried out at the Department of TOVPM by Marina Filippova, a student of group 2D2A, under the supervision of Candidate of Chemical Sciences Ann Manankova.

    Removal of temporary pacemaker after cardiac surgery in infants: A harmless procedure?

    External pacemakers (PM) connected via temporary epicardial leads are routinely applied to infants and children during heart surgery; after an uneventful postsurgical course, the leads can usually be removed without complications. We report on two infants with complex congenital heart defects after cardiac surgery (arterial switch and Mustard operation for transposition of the great arteries). Intraoperatively, these patients received temporary epicardial PM wires. Thirteen and 18 days after surgery, respectively, the PM wires were removed under electrocardiogram (ECG) monitoring. The patients showed acute ECG changes in the form of significant ST elevation during and after removal of their pacing wires. Clinically, the patients were stable, and subsequent echocardiographic examination showed no evidence of myocardial dysfunction or pericardial effusion. Over time, the patients showed no signs of arrhythmia or abnormal ECG changes. The decision to place temporary pacing wires during cardiac surgery in patients with congenital heart defects should be considered carefully, and their removal should occur under ECG monitoring as soon as the patient's condition allows. It should be taken into consideration that a complication such as the one described here may be related to delayed removal of temporary PM leads. © 2012 - IOS Press and the authors

    An axiomatic approach to the non-linear theory of generalized functions and consistency of Laplace transforms

    We offer an axiomatic definition of a differential algebra of generalized functions over an algebraically closed non-Archimedean field. This algebra is of Colombeau type in the sense that it contains a copy of the space of Schwartz distributions. We study the uniqueness of the objects we define and the consistency of our axioms. Next, we identify an inconsistency in the conventional Laplace transform theory. As an application, we offer a contradiction-free alternative in the framework of our algebra of generalized functions. The article is aimed at mathematicians, physicists and engineers who are interested in the non-linear theory of generalized functions, but who are not necessarily familiar with the original Colombeau theory. We assume, however, some basic familiarity with the Schwartz theory of distributions. Comment: 23 pages

    Nonsteroidal Anti-Inflammatory Drugs and Opioids in Postsurgical Dental Pain

    Postsurgical dental pain is mainly driven by inflammation, particularly through the generation of prostaglandins via the cyclooxygenase system. Thus, it is no surprise that numerous randomized placebo-controlled trials studying acute pain following the surgical extraction of impacted third molars have demonstrated the remarkable efficacy of nonsteroidal anti-inflammatory drugs (NSAIDs) such as ibuprofen, naproxen sodium, etodolac, diclofenac, and ketorolac in this prototypic condition of acute inflammatory pain. Combining an optimal dose of an NSAID with an appropriate dose of acetaminophen appears to further enhance analgesic efficacy and potentially reduce the need for opioids. In addition to being on average inferior to NSAIDs as analgesics in postsurgical dental pain, opioids produce a higher incidence of side effects in dental outpatients, including dizziness, drowsiness, psychomotor impairment, nausea/vomiting, and constipation. Unused opioids are also subject to misuse and diversion, and they may cause addiction. Despite these risks, some dental surgical outpatients may benefit from a 1- or 2-d course of opioids added to their NSAID regimen. NSAID use may carry significant risks in certain patient populations, in which a short course of an acetaminophen/opioid combination may provide a more favorable benefit versus risk ratio than an NSAID regimen. © International & American Associations for Dental Research 2020